

# MIT Project: From Lines to Networks

## Predicting HDB Resale Prices with Linear Regression and MLP

Mathematical Investigative Task (MIT) 2026

Hwa Chong Institution | H2 Mathematics


## Project Overview

This project demonstrates the mathematical foundations of machine learning by implementing Linear Regression and a Multi-Layer Perceptron (MLP) from scratch using only NumPy. We apply these models to predict HDB resale prices in Singapore, showcasing real-world applications of H2 Mathematics concepts.

### Theme

SDG 11 (Sustainable Cities and Communities): understanding housing affordability through data-driven analysis.


## Mathematical Concepts Demonstrated

### Linear Regression
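The least-squares idea behind linear regression can be written as the normal equation $(X^\top X)\hat{w} = X^\top y$. The sketch below is illustrative only and is not the code in `models/linear_regression.py`. It assumes synthetic data with made-up true weights, and the helper name `fit_normal_equation` is hypothetical. It calls `np.linalg.solve` instead of forming an explicit inverse, which is the more numerically stable choice.

```python
import numpy as np

def fit_normal_equation(X, y):
    """Least-squares weights from the normal equation (X^T X) w = X^T y.

    A column of ones is prepended so that w[0] is the intercept.
    (Hypothetical helper, for illustration only.)
    """
    X_b = np.column_stack([np.ones(len(X)), X])      # add bias column
    return np.linalg.solve(X_b.T @ X_b, X_b.T @ y)   # solve, don't invert

# Synthetic data with hypothetical true weights: y = 3 + 2*x1 - 1*x2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3 + 2 * X[:, 0] - 1 * X[:, 1] + rng.normal(scale=0.1, size=200)

w = fit_normal_equation(X, y)
print(np.round(w, 2))  # close to the true weights [3, 2, -1]
```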

### Multi-Layer Perceptron (MLP)
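A minimal sketch of a one-hidden-layer MLP forward pass, using $z^{(1)} = W^{(1)}x + b^{(1)}$, $a^{(1)} = \mathrm{ReLU}(z^{(1)})$ and a linear output layer $\hat{y} = W^{(2)}a^{(1)} + b^{(2)}$. The layer sizes, the random initialisation and the function names are assumptions chosen for illustration. They are not taken from `models/mlp.py`.

```python
import numpy as np

def relu(z):
    """Element-wise ReLU: max(0, z)."""
    return np.maximum(0.0, z)

def init_params(n_in, n_hidden, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    return {
        "W1": rng.normal(scale=0.1, size=(n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(1, n_hidden)),
        "b2": np.zeros(1),
    }

def forward(params, X):
    """Forward pass for a batch X of shape (n_samples, n_in).

    Returns predictions of shape (n_samples,).
    """
    z1 = X @ params["W1"].T + params["b1"]      # z^(1) = W^(1) x + b^(1), per row
    a1 = relu(z1)                               # a^(1) = ReLU(z^(1))
    y_hat = a1 @ params["W2"].T + params["b2"]  # linear output layer
    return y_hat.ravel()

rng = np.random.default_rng(42)
params = init_params(n_in=3, n_hidden=8, rng=rng)
X = rng.normal(size=(5, 3))
print(forward(params, X).shape)  # (5,)
```

Rows of `X` are samples, so the per-sample product $W^{(1)}x$ becomes `X @ W1.T` for a whole batch.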

## Dataset

- Source: Singapore Government Data (data.gov.sg)
- File: Resale flat prices based on registration date from Jan-2017 onwards
- Size: 229,273 transactions

### Features Engineered

| Feature | Description | Mathematical Role |
|---------|-------------|-------------------|
| floor_area_sqm | Flat size in square metres | Input variable |
| remaining_lease_years | Years left on lease | Input variable |
| storey_mid | Middle of storey range | Input variable |
| town_* | One-hot encoded towns | Categorical features |
| flat_* | One-hot encoded flat types | Categorical features |
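A plain-Python sketch of how these columns could be derived from one raw row. It assumes the raw CSV stores `storey_range` as text such as `"10 TO 12"` and `remaining_lease` as text such as `"61 years 04 months"`. The helper names, the sample row and the short category lists are hypothetical and may not match `models/data_loader.py`.

```python
import re

def storey_mid(storey_range):
    """'10 TO 12' -> 11.0 (midpoint of the storey range)."""
    low, high = (int(s) for s in storey_range.split(" TO "))
    return (low + high) / 2

def remaining_lease_years(remaining_lease):
    """'61 years 04 months' -> 61.33...; '70 years' -> 70.0."""
    years = re.search(r"(\d+)\s*year", remaining_lease)
    months = re.search(r"(\d+)\s*month", remaining_lease)
    y = int(years.group(1)) if years else 0
    m = int(months.group(1)) if months else 0
    return y + m / 12

def one_hot(value, categories, prefix):
    """{'prefix_cat': 0 or 1, ...} with a 1 only for the matching category."""
    return {f"{prefix}_{c.replace(' ', '_')}": int(value == c) for c in categories}

# Hypothetical raw row and category lists, for illustration only
row = {"storey_range": "10 TO 12", "remaining_lease": "61 years 04 months",
       "town": "BISHAN", "flat_type": "4 ROOM", "floor_area_sqm": "93"}
features = {
    "floor_area_sqm": float(row["floor_area_sqm"]),
    "remaining_lease_years": remaining_lease_years(row["remaining_lease"]),
    "storey_mid": storey_mid(row["storey_range"]),
    **one_hot(row["town"], ["ANG MO KIO", "BISHAN", "TAMPINES"], "town"),
    **one_hot(row["flat_type"], ["3 ROOM", "4 ROOM", "5 ROOM"], "flat"),
}
print(features["storey_mid"], features["town_BISHAN"], features["flat_3_ROOM"])  # 11.0 1 0
```

When a linear model also has an intercept, one category per one-hot group is usually dropped to avoid perfect collinearity (the "dummy variable trap").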


## Results

### Linear Regression on Real HDB Data

- Floor area: +$2,000 per sqm
- Remaining lease: +$37,000 per year
- Higher floors: +$55,000 premium
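In a linear model each coefficient is the predicted change in price per unit change of that feature, with all other features held fixed. The following worked example uses the floor-area and lease coefficients listed above. The two flats being compared are hypothetical, and `price_difference` is an illustrative helper.

```python
# Coefficients as listed above (dollars per unit of each feature)
coef = {"floor_area_sqm": 2_000, "remaining_lease_years": 37_000}

def price_difference(changes):
    """Predicted price change from per-feature changes, others held fixed."""
    return sum(coef[name] * delta for name, delta in changes.items())

# Hypothetical flat B: 10 sqm larger and 5 more lease years than flat A
print(price_difference({"floor_area_sqm": 10, "remaining_lease_years": 5}))
# 205000  (= 10 * 2,000 + 5 * 37,000)
```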

### Why Linear Regression Works Well

HDB pricing has mostly linear relationships.

### MLP on Non-Linear Synthetic Data

On non-linear synthetic data, the MLP reaches R² = 0.97.

### Comparison on Real HDB Data

| Model | R² | Strengths | Weaknesses |
|-------|-----|-----------|------------|
| Linear Regression | 0.52 | Interpretable, fast | Can't learn curves |
| MLP | ~0.50\* | Flexible, powerful | More complex, needs tuning |

\*MLP performance depends heavily on hyperparameters (for example hidden-layer size, learning rate and number of training epochs). With further tuning, it may match or exceed linear regression on this dataset.
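R² is the coefficient of determination, $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$. It compares a model's squared error with the squared error of always predicting the mean. Below is a minimal NumPy sketch. The function name and the sample prices are illustrative.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation about the mean
    return 1.0 - ss_res / ss_tot

y = [300_000, 450_000, 520_000, 610_000]
print(r_squared(y, y))                      # 1.0 (perfect predictions)
print(r_squared(y, [np.mean(y)] * len(y)))  # 0.0 (always predicting the mean)
```

On this scale, an R² of 0.52 means the model explains about half of the variance in resale prices on the data it was evaluated on.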


## Project Structure


```
~/MIT_Project/
├── data/
│   ├── Resale flat prices based on registration date from Jan-2017 onwards.csv (229K rows)
│   └── [4 other CSV files]
├── models/
│   ├── linear_regression.py            # Linear regression from scratch
│   ├── mlp.py                          # MLP from scratch
│   ├── data_loader.py                  # HDB data preprocessing
│   ├── train_hdb_linear.py             # Train LR on HDB data
│   ├── train_hdb_comparison.py         # Compare LR vs MLP
│   ├── linear_regression_params.json   # Saved model weights
│   └── hdb_comparison_results.json     # Comparison results
├── manim/
│   ├── mit_animations.py               # Manim animation scripts
│   └── create_animations.py            # Matplotlib fallback
├── notebooks/
│   └── [For Jupyter exploration]
└── README.md                           # This file
```

## How to Run

### 1. Linear Regression Demo


```bash
cd ~/MIT_Project/models
python3 linear_regression.py
```

### 2. Train on Real HDB Data


```bash
python3 train_hdb_linear.py
```

### 3. Compare LR vs MLP


```bash
python3 train_hdb_comparison.py
```

### 4. MLP on Synthetic Data


```bash
python3 mlp.py
```


## Animation Storyboard

### Scene 1: Title

"From Lines to Networks: How Machines Learn to Predict HDB Resale Prices"

### Scene 2: Data Visualization

Scatter plot of HDB transactions (floor area vs. price)

### Scene 3: Linear Regression Model

### Scene 4: Gradient Descent

Animation showing the weight updates converging to the minimum of the loss
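Here is a minimal sketch of batch gradient descent for linear regression with mean-squared-error loss $L = \frac{1}{n}\sum_i(\hat{y}_i - y_i)^2$. Each step applies $w \leftarrow w - \eta \nabla_w L$, where $\nabla_w L = \frac{2}{n}X^\top(Xw - y)$. The learning rate $\eta$, the step count and the synthetic data are assumptions. They are not the settings used in `models/linear_regression.py`.

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, steps=500):
    """Fit y ~ X_b @ w by batch gradient descent on mean squared error.

    Returns the weights (w[0] is the intercept) and the loss history.
    """
    X_b = np.column_stack([np.ones(len(X)), X])  # bias column
    w = np.zeros(X_b.shape[1])                   # start at the origin
    losses = []
    for _ in range(steps):
        residual = X_b @ w - y                   # y_hat - y
        losses.append(np.mean(residual ** 2))    # MSE before this update
        grad = 2 / len(y) * X_b.T @ residual     # dL/dw
        w -= lr * grad                           # step downhill
    return w, losses

# Synthetic data with hypothetical true weights: y = 4 + 3*x + noise
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1))
y = 4 + 3 * X[:, 0] + rng.normal(scale=0.1, size=200)

w, losses = gradient_descent(X, y)
print(np.round(w, 2), losses[0] > losses[-1])  # weights close to [4, 3]; loss fell
```

Plotting `losses` against the step number gives the kind of convergence curve this scene animates.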

### Scene 5: Linear Regression Result

Best-fit line on HDB data with R² = 0.52

### Scene 6: The Non-Linear Problem

Linear model failing on curved data
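A small case shows why a straight line fails on curved data. Fit a line to $y = x^2$ on a symmetric interval. By symmetry the best-fit slope is zero, so the fitted line is flat and explains essentially none of the variance. The sketch below uses synthetic data and `np.polyfit` with `deg=1` for the least-squares line.

```python
import numpy as np

x = np.linspace(-1, 1, 101)  # symmetric inputs
y = x ** 2                   # curved (non-linear) target

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares straight line
y_line = slope * x + intercept

ss_res = np.sum((y - y_line) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(slope, 1 - ss_res / ss_tot)  # both approximately 0
```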

### Scene 7: MLP Architecture

Network diagram: Input → Hidden → Output

### Scene 8: Forward Pass

Data flows through the network: $z^{(1)} = W^{(1)}x + b^{(1)}$, $a^{(1)} = \mathrm{ReLU}(z^{(1)})$

### Scene 9: Backpropagation

Chain-rule visualization: $\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w}$
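The chain rule can be checked numerically on a single ReLU neuron with squared-error loss. Take $z = wx + b$, $\hat{y} = \mathrm{ReLU}(z)$ and $L = (\hat{y} - y)^2$. The three factors are then $\frac{\partial L}{\partial \hat{y}} = 2(\hat{y} - y)$, $\frac{\partial \hat{y}}{\partial z} = \mathbb{1}[z > 0]$ and $\frac{\partial z}{\partial w} = x$. The sketch below compares this product with a central finite-difference estimate. The input values are arbitrary.

```python
def loss(w, b, x, y):
    """Squared error of a single ReLU neuron: (ReLU(w*x + b) - y)^2."""
    z = w * x + b
    y_hat = max(0.0, z)
    return (y_hat - y) ** 2

def grad_w_chain_rule(w, b, x, y):
    """dL/dw = dL/dy_hat * dy_hat/dz * dz/dw."""
    z = w * x + b
    y_hat = max(0.0, z)
    dL_dyhat = 2 * (y_hat - y)
    dyhat_dz = 1.0 if z > 0 else 0.0  # ReLU derivative (taken as 0 at z = 0)
    dz_dw = x
    return dL_dyhat * dyhat_dz * dz_dw

def grad_w_numeric(w, b, x, y, h=1e-6):
    """Central finite-difference estimate of dL/dw."""
    return (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2 * h)

w, b, x, y = 0.8, 0.1, 1.5, 2.0
print(grad_w_chain_rule(w, b, x, y), grad_w_numeric(w, b, x, y))  # both about -2.1
```

Backpropagation in an MLP applies this same factor-by-factor product layer by layer, reusing each layer's upstream gradient.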

### Scene 10: Comparison

Side-by-side: Linear Regression (R² = 0.52 on real HDB data) vs MLP (R² = 0.97 on non-linear synthetic data)

### Scene 11: Conclusion

Mathematics powers machine learning: statistics, calculus, linear algebra


## Key Takeaways

  1. Linear Regression is powerful for linear relationships and highly interpretable
  2. MLP can learn non-linear patterns but requires more tuning
  3. Feature Engineering matters: creating meaningful inputs from raw data
  4. Gradient Descent is the engine that makes learning possible
  5. Real-world data validates theoretical understanding

## AI Use Declaration

AI Tool Used: ChatGPT (Claude Code / Hermes Agent)

Purpose:

Original Contribution:

Note: All core algorithms (linear regression, MLP, backpropagation) were implemented from scratch, without machine learning libraries such as scikit-learn, PyTorch or TensorFlow.


## References

  1. HDB Resale Price Data: https://data.gov.sg
  2. SDG 11: Sustainable Cities and Communities (UN)
  3. H2 Mathematics Syllabus (2027): Statistics, Calculus, Linear Algebra

- Submitted: Term 2 Week 10, 2026
- Group Members: [Your names here]
- Class: 26S6B